Analysis of over 1 million race records shows runners from East African countries as the fastest in 50-km ultra-marathons

The 50-km ultra-marathon is a popular race distance, slightly longer than the classic marathon distance. However, little is known about the country of affiliation and age of the fastest 50-km ultra-marathon runners and where the fastest races are typically held. Therefore, this study aimed to investigate a large dataset of race records for the 50-km distance race to identify the country of affiliation and the age of the fastest runners as well as the locations of the fastest races. A total of 1,398,845 50-km race records (men, n = 1,026,546; women, n = 372,299) were analyzed using both descriptive statistics and advanced regression techniques. This study revealed significant trends in the performance of 50-km ultra-marathoners. The fastest 50-km runners came from African countries, while the fastest races were found to occur in Europe and the Middle East. Runners from Ethiopia, Lesotho, Malawi, and Kenya were the fastest in this race distance. The fastest 50-km racecourses, providing ideal conditions for faster race times, are in Europe (Luxembourg, Belarus, and Lithuania) and the Middle East (Qatar and Jordan). Surprisingly, the fastest ultra-marathoners in the 50-km distance were found to fall into the age group of 20–24 years, challenging the conventional belief that peak ultra-marathon performance comes in older age groups. These findings contribute to a better understanding of the performance models in 50-km ultra-marathons and can serve as valuable insights for runners, coaches, and race organizers in optimizing training strategies and racecourse selection.


Statistical analysis
Histograms of the number of records and the average race speed by age/age group were visualized and showed approximately Gaussian distributions. The variables 'athlete country' (the athlete's country of affiliation) and 'event country' (the country where the race took place) were used to rank countries by average race speed: records were aggregated by each country column, and the groups were then sorted by average race speed. Records from countries with fewer than 10 records were removed to reduce noise and ensure that the results were statistically representative.
The resulting dataset contained 1,398,845 race records from 549,154 unique runners from 122 countries, who participated in 50-km races held in 86 countries worldwide between 1894 and 2022. The descriptive statistics in the ranking tables include the number of records and the mean, standard deviation (std), maximum, and minimum race speed.
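The country ranking described above (aggregate by country, drop countries with fewer than 10 records, sort by mean speed, report count/mean/std/min/max) can be sketched in standard-library Python. This is an illustrative sketch, not the authors' code: the record layout (dicts with a country field and a hypothetical `speed_kmh` field) and the function name `rank_countries` are assumptions. With pandas, a comparable aggregation would be `df.groupby(key)["speed_kmh"].agg(["count", "mean", "std", "min", "max"])`, followed by filtering on `count` and sorting on `mean`.

```python
from collections import defaultdict
from statistics import mean, stdev


def rank_countries(records, key, min_records=10):
    """Rank countries by mean race speed (km/h), fastest first.

    records: iterable of dicts holding a country field named by `key`
             (e.g. "athlete_country" or "event_country") and a
             hypothetical "speed_kmh" field.
    Countries with fewer than `min_records` records are dropped.
    Returns a list of (country, count, mean, std, min, max) tuples.
    """
    speeds_by_country = defaultdict(list)
    for rec in records:
        speeds_by_country[rec[key]].append(rec["speed_kmh"])

    rows = []
    for country, speeds in speeds_by_country.items():
        if len(speeds) < min_records:
            continue  # too few records to be statistically representative
        rows.append((
            country,
            len(speeds),
            mean(speeds),
            stdev(speeds) if len(speeds) > 1 else 0.0,  # sample std (n - 1)
            min(speeds),
            max(speeds),
        ))
    # Sort by mean speed, fastest country first
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```

Using the sample standard deviation (n − 1 denominator) matches the default of pandas' `std()`; a country whose records all have the same speed gets std = 0.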
In addition, an XGBoost regression model was built with the following variables as predictors (model inputs): Athlete_gender_ID, Age_group_ID, Athlete_country_ID, and Event_country_ID. These variables are the encoded versions of the original variables ('athlete gender', 'age group', 'athlete country', and 'event country'). Athlete_gender_ID was encoded as 0 = female and 1 = male. Age_group_ID was encoded as the lowest value included in the age group ("18-24" becomes 18, "25-29" becomes 25, etc.). The Athlete_country_ID and Event_country_ID variables were encoded according to the country's position in the descriptive country ranking tables. The predicted variable (model output) was 'race speed' (km/h). Two evaluation metrics, the mean absolute error (MAE) and the coefficient of determination (R²), were calculated to assess the model's accuracy and behavior, together with the model's relative feature importance and prediction distribution plots. Following some basic hyper-parameter tuning, the model was trained and tested on the full sample (in-sample testing).
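The encoding rules above can be made concrete with a small sketch. It assumes hypothetical raw field names (`athlete_gender` coded "M"/"F", `age_group` strings such as "25-29" or "75+", and the two country fields) and assumes that rank 1 denotes the first (fastest) row of each ranking table; only the four output names come from the text. The resulting rows would then form the feature matrix for an XGBoost regressor (for example `xgboost.XGBRegressor`), whose tuned hyper-parameters are not reported here.

```python
def country_ranks(ranking_rows):
    """Map country -> position (1 = first row) in a ranking table.

    ranking_rows: list of tuples whose first element is the country,
    already sorted fastest-first.
    """
    return {row[0]: pos for pos, row in enumerate(ranking_rows, start=1)}


def encode_record(rec, athlete_country_rank, event_country_rank):
    """Encode one race record into the four numeric model inputs."""
    return {
        # 0 = female, 1 = male (raw coding "F"/"M" is an assumption)
        "Athlete_gender_ID": 1 if rec["athlete_gender"] == "M" else 0,
        # Lowest age in the group: "25-29" -> 25, "75+" -> 75
        "Age_group_ID": int(rec["age_group"].split("-")[0].rstrip("+")),
        # Position of the country in the descriptive ranking tables
        "Athlete_country_ID": athlete_country_rank[rec["athlete_country"]],
        "Event_country_ID": event_country_rank[rec["event_country"]],
    }
```

Because countries are numbered by their ranking position, the country IDs are ordinal and, by construction, monotone in mean speed; a tree-based model such as XGBoost can exploit this with simple threshold splits, whereas a linear model treats each ID as a single linear trend.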
To further qualify the results, an MLR (Multivariate Linear Regressor) model and four individual ULR (Univariate Linear Regressor) models, all based on the OLS (Ordinary Least Squares) method, were built. Their results were then compared with the XGBoost results to quantify the statistical importance of the variables. All data processing and analysis were performed using Python (http://www.python.org/) and a Google Colab notebook (https://colab.research.google.com/).
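Each ULR model regresses race speed on a single encoded predictor by OLS. A minimal standard-library sketch of such a fit is shown below, assuming plain lists of numbers as input; `ols_univariate` is a hypothetical name. The MLR model would instead fit all four predictors jointly (for instance with `statsmodels.api.OLS` on a design matrix with an added constant), which also yields per-coefficient P values.

```python
def ols_univariate(x, y):
    """Fit y = a + b * x by ordinary least squares.

    Returns (intercept a, slope b, coefficient of determination R^2).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    # With one predictor and an intercept, 1 - SS_res / SS_tot
    # equals the squared Pearson correlation between x and y.
    r2 = sxy ** 2 / (sxx * syy)
    return a, b, r2
```

Because R² here is the squared correlation, a ULR R² of 0.279 corresponds to a correlation of roughly |r| ≈ 0.53 between the encoded predictor and race speed.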

Ethical approval
This study was approved by the Institutional Review Board of Kanton St. Gallen, Switzerland, with a waiver of the requirement for informed consent of the participants, as the study involved the analysis of publicly available data (EKSG 01/06/2010). The study was conducted in accordance with recognized ethical standards according to the Declaration of Helsinki adopted in 1964 and revised in 2013.

Model interpretability charts
The charts and plots presented in Figs. 2–5 combine a descriptive view of the full 50-km race sample with insights from the predictive model. For each of the four predictor variables (age, gender, country of affiliation, and country of event), a set of three charts is shown: a prediction distribution boxplot at the top, labelled with the 2nd quartile (median value); a red line chart in the middle showing the average race speed of each group, which serves as the target for the model's prediction distributions; and a count chart at the bottom showing the number of race records for each value of the predictor (group). For the 'Age group' and 'Athlete gender' predictors, all values are displayed; for the 'Athlete country' and 'Event country' predictors, only the first (fastest) 20 are displayed because of their high cardinality (these match the top 20 countries in the ranking tables).
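The three panels of each figure summarize the same grouping in three ways. The following sketch computes those three series for one predictor; it assumes parallel lists of group labels, observed speeds, and model predictions, and the function name `panel_series` is hypothetical. Plotting itself is omitted.

```python
from statistics import mean, median


def panel_series(groups, observed, predicted, order=None, top_n=None):
    """Summarise one predictor for the three-panel chart.

    groups:    group label of each record (e.g. age group or country)
    observed:  observed race speed (km/h) of each record
    predicted: model-predicted race speed (km/h) of each record
    order:     optional display order (e.g. ranking-table order)
    top_n:     optionally keep only the first N groups (e.g. 20 countries)
    Returns {group: (median prediction, mean observed speed, count)}.
    """
    by_group = {}
    for g, obs, pred in zip(groups, observed, predicted):
        obs_list, pred_list = by_group.setdefault(g, ([], []))
        obs_list.append(obs)
        pred_list.append(pred)

    labels = order if order is not None else sorted(by_group)
    labels = [g for g in labels if g in by_group]
    if top_n is not None:
        labels = labels[:top_n]

    return {
        g: (median(by_group[g][1]),   # top panel: boxplot median label
            mean(by_group[g][0]),     # middle panel: red average line
            len(by_group[g][0]))      # bottom panel: record count
        for g in labels
    }
```

Passing the ranking-table order together with `top_n=20` reproduces the restriction to the 20 fastest countries used for the two country predictors.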

Evaluation metrics and features importance
The model for the 50-km race class has a coefficient of determination of R² = 0.36, which indicates a weak but non-negligible association between the predictor variables and the model output. In terms of feature importance, 'Event country' was the most important predictor (66%), followed by 'Athlete gender' (23%), 'Age group' (7%), and 'Athlete country' (5%) (Fig. 6). This hierarchy underscores the relative impact of these factors on race performance. The MLR model achieved R² = 0.325, only marginally lower than the XGBoost result. All four predictors contributed statistically significantly to the MLR model output (P < 0.001 in all cases). The ULR models showed that, although statistically significant, 'Athlete gender' (R² = 0.025) and 'Age group' (R² = 0.006) explain little of the variance on their own, whereas 'Event country' (R² = 0.279) and 'Athlete country' (R² = 0.260) explain considerably more. The 'Athlete country' and 'Event country' variables are therefore nearly equally important when used individually, suggesting a high correlation between them (e.g., runners at the events in a given country were mostly affiliated with that same country).
Figure 1. The number of runners and men-to-women ratio over the years.
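For reference, the two evaluation metrics and the conversion of feature importances to percentages can be written out explicitly. An R² of 0.36 means that the predictors account for about 36% of the variance in race speed, while the MAE is expressed directly in km/h. The sketch assumes plain Python lists; which XGBoost importance type (e.g. gain or weight) underlies the reported percentages is not stated, so the hypothetical `relative_importance` helper simply rescales whatever raw scores it is given.

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target (km/h here)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_importance(raw_scores):
    """Rescale raw feature-importance scores so that they sum to 100%."""
    total = sum(raw_scores.values())
    return {name: 100.0 * score / total for name, score in raw_scores.items()}
```

Percentages rounded to whole numbers need not sum to exactly 100, which is why the reported 66%, 23%, 7%, and 5% add up to 101%.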

Discussion
The primary objective of this study was to investigate the country of affiliation of the fastest ultra-marathoners in the 50-km race category. Further aims were to identify the countries where the fastest 50-km ultra-marathon races are held and the age of the fastest runners. The main findings were that (i) the fastest runners in the 50-km ultra-marathon originate from African countries (Ethiopia, Lesotho, Malawi, and Kenya), (ii) the countries with the fastest 50-km racecourses are in Europe (Luxembourg, Belarus, and Lithuania) and the Middle East (Qatar and Jordan), and (iii) the age group 20–24 years showed the fastest 50-km ultra-marathon times. The results refute the authors' hypothesis, since the fastest runners were from African countries and the fastest 50-km ultra-marathon runners were younger than expected.

Runners from Ethiopia, Lesotho, Malawi, and Kenya are the fastest 50-km ultra-marathoners
The first finding was that runners from Ethiopia, Lesotho, Malawi, and Kenya were the fastest 50-km ultra-marathoners. Several factors contribute to the prevalence of runners from East African countries in long-distance running events such as marathons and ultra-marathons, including genetic predisposition, adherence to a traditional diet, living and training at high altitude, and sociocultural background 28,29. It is important to note that, because of the country's infrastructure, Ethiopians walk or run daily for long periods of time, often carrying heavy school bags 30,31.
It has long been suggested that genetic background strongly influences sporting potential by determining the anthropometric, cardiovascular, and muscular characteristics that contribute to adaptation during physical training 32. This has led to the suggestion that runners from East African countries possess an inherent genetic advantage that predisposes them to superior athletic ability 32. However, genetic studies of elite African runners have not identified any unique genetic makeup; instead, they underscore the substantial genetic diversity among both the general population and elite runners from East African countries 33. Based on the available evidence, the runners' phenotype, shaped by various factors over time, has a greater influence on their success in long-distance running than their genotype 34.
Nevertheless, Kenyan runners have been found to exhibit significantly higher activity of the enzyme hydroxyacyl-CoA dehydrogenase, which plays a key role in generating energy from lipids 35. This suggests that Kenyan runners may derive energy from lipid sources more efficiently than some of their competitors 35. Currently, no information is available regarding enzymatic activity in elite Basotho, Malawian, or Ethiopian distance runners.
Larsen et al. 36 examined the anthropometric characteristics of Kenyan distance runners and found that their legs were 5% longer than those of elite distance runners from Scandinavian countries. The Kenyan runners also had thinner and lighter calves, weighing 12% less than those of the Scandinavian runners. Supporting these findings, Saltin et al. 37 showed that Kenyan distance runners had greater metabolic efficiency, particularly at race-pace running speeds, than runners from Scandinavian countries. These observations suggest that the inherent ectomorphic somatotype of elite Kenyan runners may contribute to their success on the track and roads by enhancing their biomechanical and metabolic efficiency.
A study of the dietary patterns of long-distance runners from Africa revealed that they comply with most nutritional guidelines for endurance runners 38. The traditional Ethiopian diet consists of 13% protein, 23% fat, and 64% carbohydrate 38, and the traditional Kenyan diet of 10% protein, 13% fat, and 77% carbohydrate 39. The national dish of Lesotho is a fermented sorghum porridge 40, and staple foods include cornmeal porridge served with a vegetable sauce 40. The carbohydrate portion of the diet consists primarily of vegetables, fruits, rice, and unrefined sugar 40. Malawian cuisine revolves around ingredients such as sugar, corn, potatoes, sorghum, and fish, including the staple food Nsima, made from ground corn 41. People in African countries have consumed these low-fat, high-carbohydrate diets for centuries, and their composition is consistent with research-based recommendations for endurance runners 29. While these diets appear beneficial for training and competing in middle- and long-distance running, they do not differ substantially from the training diets of runners from other continents 29. Because these diets are therefore unlikely to provide a distinct competitive advantage, factors other than diet probably play a larger role in determining athletic superiority. It should be noted that a high-carbohydrate diet maintains muscle glycogen but has a negative effect on high-intensity exercise performance 42; however, this may matter less at the lower intensities typical of long-distance running.
Certain factors, such as total hemoglobin mass, may be influenced by the environment in which elite runners from the Kalenjin people in Kenya and the Arsi people in Ethiopia live and train 29. The Kalenjin and Arsi people have long resided at altitudes of 2000 to 2500 m 29. In particular, Ethiopian elite runners originate from high-altitude areas exceeding 4000 m, and approximately 80% of the population lives at or above 2000 m 43. Malawi's central plateaus, lying at 760 to 1370 m, cover approximately three-fourths of the country's land area 44. Lesotho is the only sovereign nation on Earth that lies entirely above 1000 m 45; its lowest point, at about 1400 m, is the highest lowest point of any country, and more than 80% of its land lies above 1800 m 46. Living and training at higher altitudes could contribute to the development of specific physiological characteristics, including total hemoglobin mass 29. For many people in Ethiopia, Lesotho, Malawi, and Kenya, running is a routine part of daily life, often used for transportation or as part of household chores, and children frequently start running at a young age as their main way of travelling to school 43,47. One hypothesis holds that long-distance runners may achieve a higher maximal oxygen uptake (VO₂max) because of this early exposure to extensive walking and running 29, which could also help explain their exceptional endurance-running performance later in life.
Several African countries, most prominently Ethiopia and Kenya, have a strong running tradition, and many experienced coaches and trainers work with young runners to develop their skills and talent 28,48,49. These countries have a well-established running infrastructure, with numerous training camps and facilities that support the development of elite runners 48–51. Exceptional athletic achievement in specific populations undoubtedly results from the successful combination of numerous factors.
In the marathon, which is about 8 km shorter than the 50-km distance, there has been a consensus, with one exception 52, that Kenyans and Ethiopians are the fastest runners. The exception was an analysis of the fastest "World Athletics" marathon runners from 1999 to 2015, which found that Latvians and Ethiopians were the fastest women and men, respectively 52. In contrast, a study of trends in the "New York City Marathon" from 2006 to 2016, as well as a separate analysis covering 50 years, showed that Kenyans and Ethiopians were the fastest 53. This observation was confirmed in another popular American race, the Boston Marathon, in analyses of data from 1972 to 2018 and from 1897 to 2017 54,55. Moreover, these two East African nationalities were the fastest in the "World Marathon Majors" (Boston, Berlin, Chicago, and New York) and the "Stockholm Marathon" from 2000 to 2014 56, and an analysis of marathon races held in Switzerland from 1999 to 2014 confirmed this observation 57. Thus, the findings of the present study in 50-km races agree with those in marathon races, which might be explained by the similarity of the two race distances.
The effect of the participants' socioeconomic status should not be ignored. Because athletes from Europe, Asia, and North America tend to have a higher socioeconomic status and therefore participate at higher rates than athletes from African countries, their samples are necessarily more heterogeneous and have a lower average speed, which makes the average speed of athletes from African countries comparatively higher. Further studies could use a stepwise analysis to gain more insight into the predictive strength of the athletes' country of affiliation itself.
Another significant finding was that the fastest mean race times were recorded in races held in Europe (Luxembourg, Belarus, and Lithuania) and the Middle East (Qatar and Jordan). This result is reflected in the 'Event country' predictor having the highest feature importance in the model.
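The selection argument in the socioeconomic paragraph above can be illustrated with a toy example. Both hypothetical countries below draw runners from the same, evenly spaced range of speeds; the only difference is that country B is represented solely by its fastest 10% of runners. The numbers are invented for illustration and are not study data; `fastest_fraction` is a hypothetical helper.

```python
from statistics import mean


def fastest_fraction(speeds, fraction):
    """Keep only the fastest `fraction` of a field (at least one runner)."""
    k = max(1, round(len(speeds) * fraction))
    return sorted(speeds, reverse=True)[:k]


# Hypothetical: both countries share the SAME ability distribution
# (speeds in km/h, evenly spaced from 8.0 to 14.0).
population = [8.0 + 0.06 * i for i in range(101)]
country_a = population                          # mass participation
country_b = fastest_fraction(population, 0.10)  # selective participation

print(f"A mean: {mean(country_a):.2f} km/h, n={len(country_a)}")
print(f"B mean: {mean(country_b):.2f} km/h, n={len(country_b)}")
```

Country B's mean comes out well above country A's even though the underlying ability distribution and the fastest runner are identical, which is why rankings by mean speed partly reflect who participates rather than only how fast a population is.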
Although Luxembourg has the fastest mean race times, it should be considered an outlier. Its high mean race speed results from an exceptionally high minimum race speed compared with the other countries. Based on these minimum race speeds, which elevate the mean, we can assume that the participating runners performed well above the average participant in other races. This effect should be considered a limitation of this study, since a large share of lower-performing runners skews the mean downward, as seen in the example of the United States of America, which has the highest average race speed but is downgraded by its high number of slower participants. Therefore, for precise measures, events with outlying mean race speeds should not be considered in future analyses.
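Excluding events with outlying mean race speeds requires an operational rule. One common choice, assumed here rather than taken from the study, is Tukey's fences: a country is flagged when its mean speed lies more than k = 1.5 interquartile ranges below the first quartile or above the third quartile of all country means. The function name `flag_outliers` and the country means in the example are hypothetical.

```python
from statistics import quantiles


def flag_outliers(country_means, k=1.5):
    """Return countries whose mean speed lies outside Tukey's fences.

    country_means: dict mapping country -> mean race speed (km/h).
    k: fence width in interquartile ranges (1.5 is a conventional default).
    """
    # Quartiles of the country means; the "inclusive" method treats the
    # values as the full set of countries rather than a sample.
    q1, _, q3 = quantiles(list(country_means.values()), n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sorted(c for c, m in country_means.items() if m < low or m > high)


# Hypothetical country means (km/h), not study data
means = {"A": 10.0, "B": 10.5, "C": 11.0, "D": 10.2, "E": 10.8, "X": 16.0}
print(flag_outliers(means))  # prints ['X']
```

A complementary safeguard, also an assumption, would be to raise the minimum number of records per event country so that small, highly selective fields carry less weight.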
All the races mentioned above share a common characteristic: they are held on flat courses with minimal elevation change. The racecourses in Belarus (indoor) 58, Lithuania (road race) 59, Qatar (road race, flat trail race) 60, and Jordan (road race) 61 are known for their flat terrain, which greatly contributes to faster race times. These races offer smoother terrain and predictable conditions, allowing runners to maintain a steady pace without being hindered by inclines or steep descents. In addition, a study has shown that race results on flat terrain have been affected by new advanced shoe technology 62.
In contrast, trail running races are characterized by a sequence of off-road sections that involve uphill and downhill segments, resulting in significant physiological and mechanical changes 63,64 .Uphill sections involve prolonged and intense concentric muscle actions, while downhill sections require eccentric actions in the lower limb muscle-tendon unit 65 .These muscle actions and the duration of contractions differ from those in level road running, which primarily involve repetitive and continuous stretch-shortening cycles in the lower limb extensors 66 .In level road running, the upward and downward movements of the center of mass are generally balanced, along with the positive and negative external work within each step 67 .However, during incline running, the "bouncing" mechanism gradually diminishes as speed and slope increase 67 .On positive slopes, the step period decreases, and the body's downward movement is reduced, while on negative slopes, the step period increases, and the upward movement decreases 67,68 .Steep changes in slope also lead to noticeable alterations in ground reaction forces, including a decrease in normal impact force peaks and parallel braking force peaks, accompanied by an increase in parallel propulsive force peaks 68 .Consequently, the repeated variations in slope and the associated mechanical responses in trail running races are likely to influence the manner of muscular contraction and metabolic demands 69 .To sum up, flat terrain plays a crucial role in achieving faster race times by providing more predictable conditions that enable runners to maintain a steady pace 70 .This allows runners to sustain their rhythm throughout the race and optimize their energy usage.
It is important to consider that the racecourse alone does not determine the entire outcome of the race.Again, factors such as runners' preparation, training methods, nutrition, and individual capabilities also play significant roles.A combination of favorable racecourse characteristics and various other factors contributes to the overall faster race times observed in these countries.Although environmental factors like humidity and temperature can influence performance, this study did not include them in its analysis because of the unreliable and incomplete data for the analyzed events.

The fastest 50-km ultra-marathoners are in the age group 20-24 years
An unexpected finding was that the fastest 50-km ultra-marathoners were in the age group 20–24 years. Typically, the fastest ultra-marathon race times are achieved at around 35 years of age or older 7. Individuals participating in an ultra-marathon were approximately 36 years old and had about seven years of prior experience competing over shorter distances 71, and the average age of first-time ultra-marathoners has remained unchanged in recent decades 71. Several studies have analyzed the age of best ultra-marathon performance 27,72–75, revealing that peak performance is generally achieved at an older age than in half-marathons and marathons 76. In marathon racing, the best race time is typically achieved around the age of 30 77,78, whereas in ultra-marathons the age of best performance has generally been observed at around 35 years or older 4,7,79,80, and the age of peak ultra-marathon performance seemingly increases with race distance. In particular, in 50-km ultra-marathon running, the best performance is usually achieved at around 39–40 years 7. This is plausible given that peak performance occurs near 30 years of age 81 and declines after 40 82.
Furthermore, this finding might be explained by differences in participation across age groups. Far fewer runners were in the age group 20–24 years than in the older age groups. This difference in participation might indicate that this age group is relatively more 'selective' than the older, more 'massive' age groups. In summary, our finding that the fastest 50-km ultra-marathoners were in the age group 20–24 years is unexpected. The partial dependence plot (PDP) of our model shows that, after peaking at the 20–24 years age group, the average race speed decreases continuously, reaching approximately −1.75 km/h for the 75+ age group. This contradicts the general belief that peak performance in ultra-marathons is achieved at an older age and suggests that younger runners may have an advantage over this race distance; further research is needed to understand the factors contributing to this age group's success.
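The age effect described above is read from a partial dependence plot. A one-way partial dependence value for a feature is the model's average prediction over the dataset when that feature is set to a fixed value for every record. The sketch below computes it for any prediction function; `toy_predict` is a hypothetical stand-in for the trained XGBoost model with invented coefficients, and the grid uses the lower-bound age encoding described in the Methods.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """One-way partial dependence of `predict` on `feature`.

    For each grid value, overwrite `feature` in every row, predict,
    and average the predictions. Returns {grid value: mean prediction}.
    """
    curve = {}
    for value in grid:
        curve[value] = mean(predict({**row, feature: value}) for row in rows)
    return curve


def toy_predict(row):
    """Hypothetical model: speed (km/h) peaks at the youngest age and declines."""
    age_penalty = 0.035 * max(0, row["Age_group_ID"] - 20)
    return 9.0 + 1.2 * row["Athlete_gender_ID"] - age_penalty


rows = [{"Athlete_gender_ID": 0, "Age_group_ID": 40},
        {"Athlete_gender_ID": 1, "Age_group_ID": 30}]
curve = partial_dependence(toy_predict, rows, "Age_group_ID", [20, 25, 30, 75])
# Centre the curve on its peak to express the decline in km/h
peak = max(curve.values())
print({age: round(v - peak, 3) for age, v in curve.items()})
```

Centring on the peak, as in the last line, is one way to express the curve as a change in km/h relative to the fastest age group; how the study's PDP was centred is not stated.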

Conclusion
In conclusion, this study provides valuable insights into the country of affiliation and performance of the fastest 50-km ultra-marathoners. Runners from Ethiopia, Lesotho, Malawi, and Kenya emerged as the top performers in this race format, benefiting from genetic predisposition, traditional diets, high-altitude living and training, and sociocultural background. The fastest mean race times, on the other hand, were observed in Europe (Luxembourg, Belarus, and Lithuania) and the Middle East (Qatar and Jordan), attributed to flat racecourses, well-developed infrastructure, and favorable conditions. A surprising finding was that the fastest ultra-marathoners in the 50-km distance were in the age group 20–24 years, challenging the notion that peak ultra-marathon performance occurs in older age groups. Further research is needed to understand the underlying factors contributing to the success of younger runners over this specific race distance.

Table 1.
Athletes' countries sorted by average (mean) race speed. Count: number of race records in each group; mean: average race speed of the race records in each group; std: standard deviation; min: minimum race speed in the group; max: maximum race speed in the group. Speeds are in km/h.

Table 2.
Event countries sorted by average (mean) race speed. Count: number of race records in each group; mean: average race speed of the race records in each group; std: standard deviation; min: minimum race speed in the group; max: maximum race speed in the group. Speeds are in km/h.